Data cleaning: tidying column names
Learning objectives:
Use janitor::clean_names() to clean column names automatically.
Use rename() and rename_with() to clean column names manually.

In R, column names are the “header” or “top” value of a column. They are used to refer to columns in code and serve as default labels in figures. They should follow a “clean”, standardized syntax so that they are easy to work with and so that our code is readable to other coders.
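To see why this matters, here is a minimal base R sketch. It uses a toy data frame with made-up values and two messy names of the kind found in the Yaounde dataset. A name that contains a space has to be wrapped in backticks every time it is referenced, while a clean name can be typed as is.

```r
# Toy data frame with two messy names similar to those in the Yaounde data.
# The values are made up for illustration.
# check.names = FALSE keeps "weight kg" as-is instead of converting it to "weight.kg".
df <- data.frame(`weight kg` = c(60, 72), AGE = c(25, 40), check.names = FALSE)

# A name containing a space must be wrapped in backticks whenever it is used
df$`weight kg`

# A name without spaces or special characters can be typed directly
df$AGE
```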
Ideally, column names:
In this lesson, we will explore how to clean column names manually and automatically in R.
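As a preview of the manual approach, the sketch below applies rename() and rename_with() from {dplyr} to a small hypothetical data frame. It assumes {dplyr} is installed; the column names mimic the Yaounde data, but the values are made up.

```r
library(dplyr)

# Hypothetical toy data with two messy names borrowed from the Yaounde data
df <- data.frame(`id ind` = c(1, 2), AGE = c(25, 40), check.names = FALSE)

df_clean <- df %>%
  # rename() takes new_name = old_name pairs, so columns are renamed one by one
  rename(id = `id ind`) %>%
  # rename_with() applies a function (here tolower) to every column name by default
  rename_with(tolower)

names(df_clean)
```

rename() suits targeted fixes to a few columns, while rename_with() is handy when the same transformation should be applied to many names at once.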
As a reminder, the data we are cleaning come from the COVID-19 serological survey conducted in Yaounde, Cameroon.
library(readr) # provides read_csv()
yaounde <- read_csv(here::here('ch02_data_cleaning_pipeline/data/yaounde_data.csv'))

To see the current column names, we can use the names() function from base R. Alternatively, return to this chapter’s intro to look at the output of skim().
names(yaounde)

## [1] "id ind" "AGE"
## [3] "AGE.CATEGORY" "SEX"
## [5] "EDUCATION" "OCCUPATION"
## [7] "weight kg" "height cm"
## [9] "is.smoker" "is.pregnant"
## [11] "is.medicated" "household with_children"
## [13] "breadwinner" "source of_revenue"
## [15] "has contact_COVID" "igg.result"
## [17] "igm result" "symptoms.."
## [19] "consultation" "treatment..combinations"
## [21] "drugsource" "hospitalised"
## [23] "sequelae" "respiration frequency."
## [25] "is drug_parac" "is drug_antibio"
## [27] "is drug_hydrocortisone" "is drug_other_anti_inflam"
## [29] "is drug_antiviral" "is drug_chloro"
## [31] "is drug_tradn" "is drug_oxygen"
## [33] "is drug_other" "is drug_no_resp"
## [35] "is drug_none" "NA"
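We can see that these names are far from clean: several contain spaces (e.g. "id ind", "weight kg"), several use periods as separators or have stray trailing periods (e.g. "AGE.CATEGORY", "symptoms..", "respiration frequency."), capitalization is inconsistent (e.g. "AGE" and "SEX" versus "breadwinner"), similar columns follow different conventions ("igg.result" versus "igm result"), and the last column is named "NA".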
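Rather than spotting these problems by eye, you can flag them with base R's grepl(). This is a rough sketch: the vector below is a hypothetical subset of the names printed above, and the two patterns used (a space or period, and any uppercase letter) are just one possible definition of a "messy" name.

```r
# Hypothetical subset of the column names printed above
col_names <- c("id ind", "AGE", "AGE.CATEGORY", "igg.result",
               "igm result", "symptoms..", "breadwinner")

# TRUE where a name contains a space or a period (inside [ ], "." is a literal period)
has_space_or_dot <- grepl("[ .]", col_names)

# TRUE where a name contains at least one uppercase letter
has_upper <- grepl("[[:upper:]]", col_names)

# Names that break at least one of the two rules
col_names[has_space_or_dot | has_upper]
```

With the real data, you would pass names(yaounde) instead of the hand-typed vector.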
..typhoid dataset.

janitor::clean_names()

A handy function for standardizing column names is clean_names() from the {janitor} package.
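Here is a minimal sketch of clean_names() in action, assuming {janitor} is installed. The toy data frame reuses four of the messy names printed above, with made-up values. By default, clean_names() converts names to snake_case: lowercase words separated by underscores.

```r
library(janitor)

# Toy data frame reusing a few of the messy Yaounde names (values are made up)
messy <- data.frame(
  `id ind` = 1:2,
  AGE = c(25, 40),
  AGE.CATEGORY = c("15 - 29", "30 - 44"),
  `weight kg` = c(60, 72),
  check.names = FALSE  # keep the messy names exactly as typed
)

# clean_names() returns the same data with standardized (snake_case by default) names
clean <- clean_names(messy)
names(clean)
```

On the real data, the equivalent call would be clean_names(yaounde). The function also takes a case argument if a style other than snake_case is preferred.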